{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "86c9499c",
   "metadata": {},
   "source": [
    "### Note:\n",
    "In this directory, there are two examples using PDFs: **Example 3 - Plot PDF for a station**, and **Example 4 - Calculate PDFs from PSDs**.  These two examples are provided in order to highlight different ways that you can use the PDF and PSD values that ISPAQ generates.  \n",
    "\n",
    "To be specific, the difference between the two examples are:  \n",
    "\n",
    "#### Example 3 - Plot PDF for a station:  \n",
    "Example 3 uses PDFs that already exist in the ISPAQ example database. This means that they have been calculated using an ispaq.py command with the `--output db --db_name ispaq_example.db` option.  \n",
    "\n",
    "This is a great way to do it, especially if you plan to run the PSDs and PDFs at the same time, say on some sort of regular schedule.  In that case, you might as well calculate both in the same command and store them both in the ISPAQ database for later retrieval.  \n",
    "\n",
    "Additionally, we have tried to make it simple to calculate PDFs in ISPAQ for cases where you already have PSDs for the time span you are interested in.  For example, PDFs calculation does not require seismic data since it instead reads in existing PSDs.  That means that if you, the user, have been calculating daily PSDs for the past year, you don’t need to load a year’s worth of data to calculate a year-long PDF  - you can just use the existing PSDs! By calculating that year-long PDF using ISPAQ, it will be saved to either the database or the csv file and you will be able to retrieve it later.  \n",
    "\n",
    "\n",
    "#### Example 4 - Calculate PDFs from PSDs:  \n",
    "Example 4 will calculate PDFs _on the fly_, meaning that they do not need to exist in the ISPAQ metric database, nor will they be saved to the ISPAQ metric database.   \n",
    "\n",
    "Why would you want to do this if you can simply use an ispaq.py command to calculate and save the PDFs in the database?  Here are a couple possible reasons: \n",
    "1) You may want to calculate PDFs on an arbitrary timeframe but don't feel the need to save the PDF values, say if you are just poking around at or investigating changes in the data and don't want to clutter the database.   \n",
    "\n",
    "2) To prevent the ISPAQ database from growing too complicated, the pdf table in the ISPAQ database is very simple and PDFs values are stored with the start and end times used to calculate that particular PDF.  If you calculate daily PDFs for a week and then additionally calculate a week-long PDF, the database will store 8 PDFs - one for each day in the week, and one that spans the entire week. This means that, even if you have used ISPAQ to calculate your arbitrary time frame, you must know the specific start and end times of the PDF that you are looking to retrieve. If you look for a time range using less-than and greater-than (<>) instead of equals-to (==) then you risk retrieving multiple PDFs, including ones that you did not intend. By using this on-the-fly method, you bypass this risk since PSDs are stored by the individual PSD (usually an hour span, can vary depending on the sample rate of the data), and only those PSDs that are needed to calculate the PDF are retrieved. \n",
    "\n",
    "  \n",
    "  \n",
    "*Both methods are valid and can be useful in different situations.*"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "standard-mirror",
   "metadata": {},
   "source": [
    "## Example 4 - Calculate PDFs from PSDs\n",
    "The intent of this series of Jupyter Notebooks is to demonstrate how metrics can be retrieved from the ISPAQ example sqlite database and provide some ideas on how to use or plot those metrics.  \n",
    "\n",
    "In this example, we will use the pdf_aggregator.py script to calculate PDFs on-the-fly.  The calculate_pdfs method in the pdf_aggregator will look for existing PSDs, calculate the PDFs, and then return the PDF values in a dataframe.  It also returns dataframes of the modes, minimums, and maximums. \n",
    "\n",
    "To generate PDFs in this example, corrected PSD values must already exist. If they do not yet exist, then you can run them via:\n",
    "\n",
    "    ./run_ispaq.py -M psd_corrected -S ANMO --starttime 2020-10-01 --endtime 2020-10-16 --output db --db_name ispaq_example.db\n",
    "\n",
    "\n",
    "This example is slightly more complicated because it is tying into the ISPAQ code directly. That means that we need to import multiple functions from various ISPAQ scripts, and we need the proper arguments available for each of those. \n",
    "\n",
    "First we import the simple ones:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "harmful-sterling",
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "import os\n",
    "import logging\n",
    "import pandas as pd\n",
    "from obspy import UTCDateTime"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "acceptable-antique",
   "metadata": {},
   "source": [
    "Now we move onto the ISPAQ-specific ones. Because of the directory structure and where this example lives, we need to add the main ispaq directory to our path. Then we will be able to import the ISPAQ modules. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fresh-unemployment",
   "metadata": {},
   "outputs": [],
   "source": [
    "path_parent = os.path.dirname(os.getcwd())\n",
    "sys.path.insert(1, f'{path_parent}/ispaq/')\n",
    "\n",
    "\n",
    "import concierge\n",
    "from user_request import UserRequest\n",
    "import PDF_aggregator"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "amino-cursor",
   "metadata": {},
   "source": [
    "With the modules imported, we now need to set up some variables that will be required to run the ISPAQ code. This includes a logger and an arguments class that contains the fields from the preference file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "express-prospect",
   "metadata": {},
   "outputs": [],
   "source": [
    "logger = logging.getLogger(__name__)\n",
    "logger.setLevel(logging.DEBUG)\n",
    "formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s', datefmt='%Y-%m-%d %H:%M:%S')\n",
    "\n",
    "class args:\n",
    "    db_name = 'ispaq_example.db'\n",
    "    starttime = UTCDateTime('2020-10-01')\n",
    "    endtime = UTCDateTime('2021-10-15')\n",
    "    metrics = 'pdf'\n",
    "    stations = 'IU.ANMO.00.BHZ.M'  # The \"stations\" must refer to a single target, including the quality code (N.S.L.C.Q)\n",
    "    preferences_file = f'{path_parent}/preference_files/default.txt'\n",
    "    station_url = 'IRIS'\n",
    "    dataselect_url = 'IRIS'\n",
    "    event_url = 'IRIS'\n",
    "    resp_dir = ''\n",
    "    output = 'db'\n",
    "    csv_dir = f'{path_parent}/csv/'\n",
    "    sncl_format = 'N.S.L.C.'\n",
    "    sigfigs = 6\n",
    "    pdf_type = 'plot'\n",
    "    pdf_interval = 'aggregated'\n",
    "    plot_include = ''\n",
    "    pdf_dir = f'{path_parent}/pdfs/'\n",
    "    psd_dir = f'{path_parent}/psds/'\n",
    "    \n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "under-cutting",
   "metadata": {},
   "source": [
    "Those will now be used to create a userRequest, which will then be used to create a concierge object. The concierge object will need to later be passed into the method that actually calculates the PDFs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "respiratory-knowing",
   "metadata": {},
   "outputs": [],
   "source": [
    "user_request = UserRequest(args, logger=logger)\n",
    "concierge = concierge.Concierge(user_request, logger=logger)\n",
    "print(concierge, logger)\n",
    "print(concierge.logger)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "toxic-timeline",
   "metadata": {},
   "source": [
    "Now that we've handled that, we can calculate the PDFs. First, we move into the directory that contains the database, since that's where the ISPAQ code expects us to be. Then we call on the calculate_PDF method, which will return dataframes that contain: PDF values, modes, maximums, and minimums. This may take a few minutes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "entertaining-profile",
   "metadata": {},
   "outputs": [],
   "source": [
    "os.chdir(path_parent)\n",
    "[pdfDF,modesDF, maxDF, minDF] = PDF_aggregator.calculate_PDF(pd.DataFrame(), args.stations, args.starttime, args.endtime, concierge)\n",
    "print(pdfDF)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "utility-kazakhstan",
   "metadata": {},
   "source": [
    "With it in a dataframe, you can now do what you want with it! Manipulate it how you want.  \n",
    "\n",
    "Below I call on the plot_PDF function to plot it up and save the figure to the pdf_dir specified above. The plot will show up below when done."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "technological-malaysia",
   "metadata": {},
   "outputs": [],
   "source": [
    "PDF_aggregator.plot_PDF(args.stations, args.starttime, args.endtime, pdfDF, modesDF, maxDF, minDF, concierge)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
